
    Shelling the Voronoi interface of protein-protein complexes predicts residue activity and conservation

    The accurate description of protein-protein interfaces remains a challenging task. Traditional criteria, based on atomic contacts or changes in solvent accessibility, tend to over- or underpredict the interface itself and cannot discriminate active from less relevant parts. A recent simulation study by Mihalek and co-authors (2007, JMB 369, 584-95) concluded that active residues tend to be 'dry', that is, insulated from water fluctuations. We show that patterns of 'dry' residues can, to a large extent, be predicted by a fast, parameter-free and purely geometric analysis of protein interfaces. We introduce the shelling order of Voronoi facets as a straightforward quantitative measure of an atom's depth inside an interface. We analyze the correlation between Voronoi shelling order, dryness, and conservation on a set of 54 protein-protein complexes. Residues with high shelling order tend to be dry; evolutionary conservation also correlates with dryness and shelling order but, perhaps not surprisingly, is a much less accurate predictor of either property. Voronoi shelling order thus seems a meaningful and efficient descriptor of protein interfaces. Moreover, the strong correlation with dryness suggests that water dynamics within protein interfaces may, to a first approximation, be described by simple diffusion models.
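
    The abstract does not spell out how the shelling order is computed. A natural reading, assumed here, is a layer count: facets on the interface boundary get order 1, and every other facet gets one more than its nearest boundary facet, measured in facet-adjacency hops. Under that assumption it is a multi-source breadth-first search. The facet adjacency graph and the `shelling_order` function below are hypothetical illustrations, not the authors' code, and the step from facet orders to per-atom depths is not modeled.

```python
from collections import deque


def shelling_order(adjacency, boundary):
    """Multi-source BFS: boundary facets get order 1, and each other facet
    gets 1 + the order of its closest boundary facet (in adjacency hops).

    adjacency: dict mapping a facet id to the ids of its neighboring facets
    (hypothetical input; building it from a Voronoi diagram is not shown)."""
    order = {f: 1 for f in boundary}
    queue = deque(boundary)
    while queue:
        f = queue.popleft()
        for g in adjacency.get(f, ()):
            if g not in order:  # first visit = shortest hop distance
                order[g] = order[f] + 1
                queue.append(g)
    return order


# Hypothetical interface: five facets in a chain, both ends on the boundary.
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(shelling_order(adjacency, [0, 4]))  # {0: 1, 4: 1, 1: 2, 3: 2, 2: 3}
```

    The central facet receives the highest order, which matches the intuition of "depth inside the interface" that the abstract correlates with dryness.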

    Low-Complexity Nonparametric Bayesian Online Prediction with Universal Guarantees

    We propose a novel nonparametric online predictor for discrete labels conditioned on multivariate continuous features. The predictor is based on a feature-space discretization induced by a full-fledged k-d tree with randomly picked directions and on a recursive Bayesian distribution, which allows it to automatically learn the most relevant feature scales characterizing the conditional distribution. We prove its pointwise universality, i.e., it achieves a normalized log loss performance asymptotically as good as the true conditional entropy of the labels given the features. The time complexity to process the n-th sample point is O(log n) in probability with respect to the distribution generating the data points, whereas other exact nonparametric methods must process all past observations. Experiments on challenging datasets show the computational and statistical efficiency of our algorithm in comparison to standard and state-of-the-art methods.
    Comment: Camera-ready version published in NeurIPS 201
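
    To make the evaluation protocol concrete (predict a label distribution, pay log loss, then update), here is a deliberately simplified stand-in, not the paper's predictor: a fixed-depth dyadic partition of [0, 1)^dim whose split direction at each level is drawn at random, with a Krichevsky-Trofimov (add-1/2) estimator in each leaf cell. It has no data-driven tree growth and no Bayesian mixture over depths, so it cannot adapt its feature scale as the abstract describes. The class and function names are hypothetical. Per-sample cost is O(depth), which mirrors the spirit, not the proof, of the O(log n) claim.

```python
import math
import random
from collections import defaultdict


class DyadicKTPredictor:
    """Hypothetical simplification: fixed-depth dyadic partition of [0, 1)^dim
    with randomly drawn split directions and a KT label estimator per leaf."""

    def __init__(self, dim, depth, num_labels, seed=0):
        rng = random.Random(seed)
        self.directions = [rng.randrange(dim) for _ in range(depth)]
        self.num_labels = num_labels
        self.counts = defaultdict(lambda: [0] * num_labels)

    def _cell(self, x):
        # Descend one level per split: O(depth) work per sample.
        lo, hi = [0.0] * len(x), [1.0] * len(x)
        path = []
        for d in self.directions:
            mid = (lo[d] + hi[d]) / 2
            if x[d] < mid:
                path.append(0)
                hi[d] = mid
            else:
                path.append(1)
                lo[d] = mid
        return tuple(path)

    def predict(self, x):
        # KT estimator: (count + 1/2) / (total + K/2) for each of K labels.
        c = self.counts[self._cell(x)]
        total = sum(c)
        return [(n + 0.5) / (total + 0.5 * self.num_labels) for n in c]

    def update(self, x, y):
        self.counts[self._cell(x)][y] += 1


def normalized_log_loss(predictor, stream):
    """Sequential (predict-then-update) log loss, in bits per sample."""
    loss, n = 0.0, 0
    for x, y in stream:
        loss -= math.log2(predictor.predict(x)[y])
        predictor.update(x, y)
        n += 1
    return loss / n


# Synthetic data: one feature, label is 1 exactly when the feature is >= 0.5.
rng = random.Random(1)
stream = [((u,), int(u >= 0.5)) for u in (rng.random() for _ in range(2000))]
print(normalized_log_loss(DyadicKTPredictor(dim=1, depth=3, num_labels=2), stream))
```

    Because the labels here are a deterministic function of the feature, the true conditional entropy is 0, and the normalized loss of this sketch shrinks toward 0 as more samples arrive.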

    Probing a Continuum of Macro-molecular Assembly Models with Graph Templates of Complexes

    Reconstruction by data integration is an emerging trend to reconstruct large protein assemblies, but uncertainties on the input data yield average models whose quantitative interpretation is challenging. This paper presents methods to probe fuzzy models of large assemblies against atomic-resolution models of sub-systems. More precisely, consider a Toleranced Model (TOM) of a macro-molecular assembly, namely a continuum of nested shapes representing the assembly at multiple scales. Also consider a template, namely an atomic-resolution 3D model of a sub-system (a complex) of this assembly. We present graph-based algorithms performing a multi-scale assessment of the complexes of the TOM, by comparing the pairwise contacts which appear in the TOM against those of the template. We apply this machinery to recent average models of the Nuclear Pore Complex, and confront our observations with the latest experimental work.
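
    At its core, comparing "the pairwise contacts which appear in the TOM against those of the template" is a set comparison on unordered protein pairs. The sketch below assumes both contact sets are already extracted (the geometry behind them is not modeled) and uses hypothetical protein names A to D; in a TOM the comparison would be repeated at each scale of the continuum.

```python
def compare_contacts(tom_contacts, template_contacts):
    """Compare unordered protein pairs seen in a TOM complex with the contacts
    of an atomic-resolution template (both given as iterables of name pairs)."""
    tom = {frozenset(p) for p in tom_contacts}
    tpl = {frozenset(p) for p in template_contacts}
    return {
        "confirmed": tom & tpl,  # contacts present in both models
        "missing": tpl - tom,    # template contacts absent from the TOM complex
        "extra": tom - tpl,      # TOM contacts the template does not contain
    }


# Hypothetical example: pair order does not matter, ("B", "A") equals ("A", "B").
result = compare_contacts([("A", "B"), ("B", "C")], [("B", "A"), ("C", "D")])
print(result["confirmed"], result["missing"], result["extra"])
```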

    Geometric, topological and contact analysis of interfaces in macro-molecular complexes: from the atomic to the complex scale using Intervor

    Understanding the sociology of interactions between the proteins encoded in a genome is a central question of structural biology, and interface models between molecules forming a complex are instrumental in this perspective. Several interface models have been proposed, qualifying interface atoms either as atoms losing solvent accessibility in the complex or as pairs of atoms within a distance threshold. Yet, until recently, no interface model existed to answer coherently (if at all) the following questions: can one bridge the gap from atoms losing solvent accessibility to interface pairs? Is the interface flat or curvy? Is it connected or not (does it have a multi-patch structure)? Is a connected component of the interface simply connected or not (does it have a hole)? What precisely is the role played by interface structural water? Using the α-complex of the van der Waals balls, a construction derived from the Voronoi diagram, we designed such an interface model and validated it on the usual database of co-crystallized protein-protein complexes. This paper is a methodological paper aiming at easing the access of structural biologists to the interface model. As such, the paper overviews: (i) the geometric principles underlying the interface model, (ii) the definitions of the interface and its extension to accommodate structural water, (iii) the statistics one can compute from the interface model, and (iv) the software Intervor and the associated web site. These presentations are accompanied by illustrations and insights on protein-protein complexes.
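
    One of the questions above, whether the interface has a multi-patch structure, reduces to counting connected components once adjacency between interface elements is known. The sketch assumes that adjacency is given as a list of pairs (for instance, Voronoi facets sharing an edge); deriving it from the α-complex, as Intervor does, is not shown, and `count_patches` is a hypothetical helper.

```python
def count_patches(nodes, edges):
    """Count connected components (patches) with a union-find structure."""
    parent = {v: v for v in nodes}

    def find(v):
        # Walk to the root, halving the path as we go.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv  # merge the two patches
    return len({find(v) for v in nodes})


# Hypothetical interface elements a..f: {a, b, c}, {d, e} and the isolated f.
print(count_patches("abcdef", [("a", "b"), ("b", "c"), ("d", "e")]))  # 3
```

    Answering the "does it have a hole" question needs more than connectivity (for example, an Euler characteristic per component), which this sketch does not attempt.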

    Robust construction of the extended three-dimensional flow complex

    The Delaunay triangulation and its dual, the Voronoi diagram, are ubiquitous geometric complexes. From a topological standpoint, the connection has recently been made between these constructions and the Morse theory of distance functions. In particular, algorithms have been designed to compute the flow complex induced by the distance function to a point set. This paper develops the first complete and robust construction of the extended flow complex, which, in addition to the stable manifolds of the flow complex, also features the unstable manifolds. A first difficulty comes from the interplay between the degenerate cases of Delaunay and those which are flow-specific. A second class of problems comes from cascaded constructions and predicates, as opposed to the standard in-circle and orientation predicates for Delaunay. We deal with both aspects and show how to implement a complete and robust flow operator, from which the extended flow complex is easily computed. We also present experimental results.
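
    Robustness here hinges on predicates whose sign is always correct. As an illustration of the standard case the abstract contrasts with, the 2D orientation predicate can be made exact by evaluating its determinant in rational arithmetic: Python's `Fraction` converts a float without rounding, so the sign is never wrong. This is a minimal sketch, not the paper's flow operator; exact rationals also work for cascaded constructions but get slow, which is why libraries such as CGAL rely on filtered exact predicates instead.

```python
from fractions import Fraction


def orient2d(p, q, r):
    """Exact sign of det[[qx - px, qy - py], [rx - px, ry - py]].

    Returns +1 if p, q, r turn counter-clockwise, -1 if clockwise and
    0 if they are collinear. Fraction(float) is exact, so no rounding
    error can flip the sign."""
    px, py = Fraction(p[0]), Fraction(p[1])
    qx, qy = Fraction(q[0]), Fraction(q[1])
    rx, ry = Fraction(r[0]), Fraction(r[1])
    det = (qx - px) * (ry - py) - (qy - py) * (rx - px)
    return (det > 0) - (det < 0)


print(orient2d((0, 0), (1, 0), (0, 1)))  # 1 (counter-clockwise)
print(orient2d((0, 0), (1, 1), (3, 3)))  # 0 (collinear)
```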

    A geometric knowledge-based coarse-grained scoring potential for structure prediction evaluation

    Knowledge-based protein folding potentials have proven successful in recent years. Based on statistics of observed interatomic distances, they generally encode pairwise contact information. In this study we present a method that derives multi-body contact potentials from measurements of surface areas using coarse-grained protein models. The measurements are made using a newly implemented geometric construction: the arrangement of circles on a sphere. This construction allows the definition of residue covering areas, which are used as parameters to build functions able to distinguish native structures from decoys. These functions, encoding up to 5-body contacts, are evaluated on a reference set of 66 structures and its 45,000 decoys, and also on the often-used lattice ssfit set from the Decoys 'R' Us database. We show that the most relevant information for discrimination resides in 2- and 3-body contacts. The potentials we have obtained can be used to evaluate putative structural models; they could also lead to new types of structure refinement techniques that use multi-body interactions.
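
    Many knowledge-based potentials follow an inverse-Boltzmann recipe: a contact type that is over-represented in native structures relative to a reference state gets a favorable (negative) energy, E(t) = -ln(f_obs(t) / f_ref(t)) in units of kT. The sketch below shows that generic recipe for arbitrary k-body contact types; it is an assumption for illustration and does not reproduce the paper's functions, which are built from residue covering areas. The residue classes "H" and "P" and the pseudocount are hypothetical.

```python
import math
from collections import Counter


def contact_potential(observed, reference, pseudocount=1.0):
    """Energies E(t) = -ln(f_obs(t) / f_ref(t)), in kT units.

    observed, reference: Counters mapping a contact type (e.g. a sorted tuple
    of residue classes for a k-body contact) to its count."""
    types = set(observed) | set(reference)
    n_obs = sum(observed.values()) + pseudocount * len(types)
    n_ref = sum(reference.values()) + pseudocount * len(types)
    return {
        t: -math.log(((observed[t] + pseudocount) / n_obs)
                     / ((reference[t] + pseudocount) / n_ref))
        for t in types
    }


def score(model_contacts, energies):
    """Sum of energies over a model's contact types; lower is more native-like."""
    return sum(energies.get(t, 0.0) for t in model_contacts)


observed = Counter({("H", "H"): 30, ("H", "P"): 10})
reference = Counter({("H", "H"): 20, ("H", "P"): 20})
energies = contact_potential(observed, reference, pseudocount=0.0)
print(score([("H", "H"), ("H", "H")], energies))
```

    With a zero pseudocount, every contact type must appear in the observed counts, otherwise the logarithm is undefined; the default pseudocount of 1 avoids that.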

    Assessing the Reconstruction of Macro-molecular Assemblies: the Example of the Nuclear Pore Complex

    The reconstruction of large protein assemblies is a major challenge due to their plasticity and to the flexibility of the proteins involved. An emerging trend to cope with these uncertainties consists of performing the reconstruction by integrating experimental data from several sources, a strategy recently used to propose qualitative reconstructions of the Nuclear Pore Complex. Yet, the absence of clearly identified canonical reconstructions and the lack of quantitative assessment with respect to the experimental data are detrimental to the mechanistic exploitation of the results. To leverage such reconstructions, this work proposes a modeling framework that inherently accommodates uncertainties and allows a precise assessment of the reconstructed models. We make three contributions. First, we introduce toleranced models to accommodate the positional and conformational uncertainties of protein instances within large assemblies. A toleranced model is a continuum of geometries whose distinct topologies can be enumerated, and mining stable complexes amidst this finite set hints at important structures in the assembly. Second, we present a panoply of tools to perform a multi-scale topological, geometric, and biochemical assessment of the complexes associated with a toleranced model, at the assembly and local levels. At the assembly level, we assess the prominence of contacts and the quality of the reconstruction, in particular with respect to symmetries. At the local level, the complexes encountered in the toleranced model are used to confirm, question, or suggest protein contacts within a 3D template known at atomic resolution. Third, we apply our machinery to the NPC, for which we (i) report prominent contacts uncovering sub-complexes of the NPC, (ii) explain the closure of the two rings involving 16 copies of the Y-complex, and (iii) develop a new 3D template for the T-complex.
These contributions should prove instrumental in enhancing the reconstruction of assemblies, and in selecting the models which best comply with experimental data.
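
    Why a continuum of geometries has only finitely many topologies can be seen with toleranced balls, assuming each ball's radius grows linearly as r(λ) = r⁻ + λ(r⁺ - r⁻) for λ in [0, 1] (the linear schedule and the [0, 1] range are assumptions of this sketch). Each pair of balls starts touching at a single λ value, so sorting those values enumerates every distinct contact graph. Stable complexes would then be those persisting over long λ intervals. The functions below are hypothetical illustrations, not the authors' software, and use brute force over all pairs.

```python
import math
from itertools import combinations


def contact_lambda(a, b):
    """Smallest lambda >= 0 at which two toleranced balls touch.

    Each ball is (center, r_minus, r_plus); its radius at lambda is
    r_minus + lambda * (r_plus - r_minus)."""
    (ca, a_lo, a_hi), (cb, b_lo, b_hi) = a, b
    gap = math.dist(ca, cb) - (a_lo + b_lo)
    growth = (a_hi - a_lo) + (b_hi - b_lo)
    if gap <= 0:
        return 0.0  # inner balls already overlap
    return gap / growth if growth > 0 else math.inf


def contact_graphs(balls, lam_max=1.0):
    """Distinct contact graphs (as edge sets) met as lambda sweeps [0, lam_max]."""
    events = [(contact_lambda(balls[i], balls[j]), (i, j))
              for i, j in combinations(range(len(balls)), 2)]
    events = [(lam, e) for lam, e in events if lam <= lam_max]
    steps = sorted({0.0} | {lam for lam, _ in events})
    return [(s, frozenset(e for lam, e in events if lam <= s)) for s in steps]


# Three hypothetical balls on a line; only the first two ever touch for lambda <= 1.
balls = [((0, 0, 0), 1.0, 2.0), ((3, 0, 0), 1.0, 2.0), ((10, 0, 0), 1.0, 2.0)]
for lam, edges in contact_graphs(balls):
    print(lam, sorted(edges))
```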

    Jet fitting 3: A Generic C++ Package for Estimating the Differential Properties on Sampled Surfaces via Polynomial Fitting

    Surfaces of ℝ³ are ubiquitous in science and engineering, and estimating the local differential properties of a surface discretized as a point cloud or a triangle mesh is a central building block in Computer Graphics, Computer Aided Design, Computational Geometry, and Computer Vision. One strategy to perform such an estimation consists of resorting to polynomial fitting, either interpolation or approximation, but this route is difficult for several reasons: choice of the coordinate system, numerical handling of the fitting problem, and extraction of the differential properties. This paper presents a generic C++ software package solving these problems. On the theoretical side, and as established in a companion paper, the interpolation and approximation methods provided achieve the best asymptotic error bounds known to date. On the implementation side, and following state-of-the-art coding rules in Computational Geometry, genericity of the package is achieved thanks to four template classes accounting for (a) the type of the input points, (b) the internal geometric computations, (c) a conversion mechanism between these two geometries, and (d) the linear algebra operations. An instantiation within the Computational Geometry Algorithms Library (CGAL, version 3.3) and using LAPACK is also provided.
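
    The basic idea of fitting a polynomial and reading off differential properties can be sketched for the simplest case: a degree-2 least-squares fit of a height function z = f(x, y), followed by the Gaussian curvature K and mean curvature H of the graph at the origin. This Python sketch is not the C++ package or its CGAL API: it assumes the points are already in a local frame where the surface is a graph over the xy-plane (precisely the coordinate-system issue the package handles), supports only degree 2, and solves the normal equations with plain Gaussian elimination rather than LAPACK. The sign of H depends on the chosen normal orientation.

```python
def _solve(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def quadratic_jet_curvatures(points):
    """Least-squares fit z = c0 + c1 x + c2 y + c3 x^2 + c4 xy + c5 y^2, then
    return (Gaussian, mean) curvature of the fitted graph at (0, 0)."""
    rows = [[1.0, x, y, x * x, x * y, y * y] for x, y, _ in points]
    zs = [z for _, _, z in points]
    # Normal equations (A^T A) c = A^T z.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(6)] for i in range(6)]
    atz = [sum(r[i] * z for r, z in zip(rows, zs)) for i in range(6)]
    c = _solve(ata, atz)
    fx, fy, fxx, fxy, fyy = c[1], c[2], 2 * c[3], c[4], 2 * c[5]
    w = 1 + fx * fx + fy * fy
    gaussian = (fxx * fyy - fxy * fxy) / w ** 2
    mean = ((1 + fy * fy) * fxx - 2 * fx * fy * fxy
            + (1 + fx * fx) * fyy) / (2 * w ** 1.5)
    return gaussian, mean


# Samples of z = x^2 + 0.25 y^2: principal curvatures 2 and 0.5 at the origin.
ticks = (-0.2, -0.1, 0.0, 0.1, 0.2)
samples = [(x, y, x * x + 0.25 * y * y) for x in ticks for y in ticks]
print(quadratic_jet_curvatures(samples))  # close to (1.0, 1.25)
```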

    Spectral Techniques to Explore Point Clouds in Euclidean Space, with Applications to Collective Coordinates in Structural Biology

    Life sciences, engineering, or telecommunications provide numerous systems whose description requires a large number of variables. Developing insights into such systems, forecasting their evolution, or monitoring them is often based on the inference of correlations between these variables. Given a collection of points describing states of the system, questions such as inferring the effective number of independent parameters of the system (its intrinsic dimensionality) and the way these are coupled are paramount to developing models. In this context, this paper makes two contributions. First, we review recent work on spectral techniques to organize point clouds in Euclidean space, with emphasis on the main difficulties faced. Second, after a careful presentation of the biophysical context, we present applications of dimensionality reduction techniques to a core problem in structural biology, namely protein folding. From both the computer science and the structural biology perspectives, we expect this survey to shed new light on the importance of nonlinear computational geometry in geometric data analysis in general, and for protein folding in particular.
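
    The simplest spectral estimate of intrinsic dimensionality is linear: diagonalize the covariance matrix of the point cloud and count how many eigenvalues are needed to explain most of the variance. The sketch below does this with a small Jacobi eigenvalue routine so it needs only the standard library. It is a baseline, not one of the nonlinear techniques the survey reviews, which can detect a curved low-dimensional manifold that PCA would overestimate. The 95% threshold is an arbitrary assumption.

```python
import math


def jacobi_eigenvalues(a, tol=1e-12, max_sweeps=100):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations, descending."""
    n = len(a)
    a = [row[:] for row in a]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the new a[p][q] is zero.
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):  # columns p, q of A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rows p, q of J^T (A J)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def intrinsic_dimension(points, explained=0.95):
    """Number of principal components needed to reach the explained-variance share."""
    n, d = len(points), len(points[0])
    mean = [sum(p[k] for p in points) / n for k in range(d)]
    cov = [[sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in points) / (n - 1)
            for j in range(d)] for i in range(d)]
    eig = [max(v, 0.0) for v in jacobi_eigenvalues(cov)]  # clip round-off negatives
    total, acc = sum(eig), 0.0
    for k, v in enumerate(eig, start=1):
        acc += v
        if acc >= explained * total:
            return k
    return d


print(intrinsic_dimension([(t, 2 * t, 3 * t) for t in range(10)]))  # points on a line: 1
```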

    Multi-scale Geometric Modeling of Ambiguous Shapes with Toleranced Balls and Compoundly Weighted alpha-shapes

    Dealing with ambiguous data is a challenge in science in general and geometry processing in particular. One route of choice to extract information from such data consists of replacing the ambiguous input by a continuum, typically a one-parameter family, so as to mine stable geometric and topological features within this family. This work follows this spirit and introduces a novel framework to handle 3D ambiguous geometric data which are naturally modeled by balls. First, we introduce toleranced balls to model ambiguous geometric objects. A toleranced ball consists of two concentric balls, and interpolating between their radii provides a way to explore a range of possible geometries. We propose to model an ambiguous shape by a collection of toleranced balls, and show that the aforementioned radius interpolation is tantamount to the growth process associated with an additively-multiplicatively weighted Voronoi diagram (also called compoundly weighted, or CW). Second and third, we investigate properties of the CW diagram and the associated CW α-complex, which provides a filtration called the λ-complex. Fourth, we propose a naive algorithm to compute the CW Voronoi diagram. Finally, we use the λ-complex to assess the quality of models of large protein assemblies, as these models inherently feature ambiguities.
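
    The link between radius interpolation and a compoundly weighted diagram can be made explicit under a linear growth assumption r(λ) = r⁻ + λ(r⁺ - r⁻). A point x is first covered by ball i at λ_i(x) = (‖x - c_i‖ - r⁻_i) / (r⁺_i - r⁻_i), which is a Euclidean distance with a multiplicative weight 1/(r⁺_i - r⁻_i) and an additive weight r⁻_i/(r⁺_i - r⁻_i). The CW Voronoi region of ball i is then the set of points for which λ_i(x) is smallest. The brute-force point location below illustrates that definition only; it is not the paper's algorithm for computing the whole diagram, and it requires r⁺ > r⁻ for every ball.

```python
import math


def cw_lambda(x, ball):
    """Growth parameter at which toleranced ball (center, r_minus, r_plus)
    first covers x, i.e. |x - c| = r_minus + lam * (r_plus - r_minus).
    Negative values mean x already lies inside the inner ball."""
    center, r_minus, r_plus = ball
    return (math.dist(x, center) - r_minus) / (r_plus - r_minus)


def cw_site(x, balls):
    """Index of the compoundly weighted Voronoi region containing x (O(n) scan)."""
    return min(range(len(balls)), key=lambda i: cw_lambda(x, balls[i]))


# Two hypothetical toleranced balls; the second one grows twice as fast.
balls = [((0, 0, 0), 1.0, 2.0), ((4, 0, 0), 1.0, 3.0)]
print(cw_site((2, 0, 0), balls))  # 1: the faster-growing ball reaches the midpoint first
```

    Because the weights differ per ball, region boundaries are curved rather than planar, which is one reason the CW diagram is harder to compute than the ordinary or power diagram.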